What Makes a Hollywood Movie a Hit or a Flop?

Final Project
Data Science 1 with R (STAT 301-1)

Author

Celena Kim

Published

December 6, 2023

Introduction

As an avid movie lover, I have always been curious about what factors make some Hollywood movies critically acclaimed blockbusters while others fade into the background. Beyond opening weekend numbers alone, I am interested in exploring the interplay between the broader set of variables that contribute to a film’s propensity to be a hit or a flop. Specifically, I focus on five factors: critic and audience ratings, opening weekend revenue, gross (domestic, foreign, and worldwide), budget and budget recovery, and Oscar wins. I am also curious whether the time of year has an impact on a movie’s success, and whether there is a particular season in which the most successful movies are released. By focusing on these main variables, I hope to answer my research question by discovering patterns and compelling correlations between the variables, from the univariate to the multivariate level. I am interested in exploring whether certain variables affect one another and how they work together to contribute to a movie’s overall success. To carry out this analysis, I use a data set found on Kaggle called “Hollywood Hits and Flops (2007 - 2023)”, described in the next section.

Data Overview and Quality

text

Many variables were not immediately suitable for analysis: several were stored as character types, and the Oscar variable was not stored as a logical (boolean) value.
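The two cleanup issues noted above (character-typed variables and a non-boolean Oscar variable) could be handled along these lines. This is only a minimal base R sketch on made-up rows: the column names `budget` and `oscar_wins`, the "$1,000,000" text format, and the choice to treat a missing Oscar count as zero wins are all assumptions, not details taken from the data set.

```r
# Toy rows; column names and value formats are hypothetical.
movies <- data.frame(
  title      = c("A", "B", "C"),
  budget     = c("$1,000,000", "$25,500,000", NA),
  oscar_wins = c(0L, 3L, NA),
  stringsAsFactors = FALSE
)

# Strip "$" and "," so the character budget can be parsed as a number.
movies$budget_num <- as.numeric(gsub("[$,]", "", movies$budget))

# Logical flag: TRUE for at least one win.
# Assumption: a missing count is treated as zero wins.
movies$won_oscar <- !is.na(movies$oscar_wins) & movies$oscar_wins > 0
```

Treating a missing count as zero wins is a judgment call; keeping it as `NA` would be the more conservative choice if missingness is not known to mean "no award".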

Explorations: What variables contribute to a movie’s overall success rate? How do these variables interact with each other?

Variable 1: Ratings

Within this dataset, the 3 main movie rating measures are the Rotten Tomatoes score (audience and critic), Metacritic score (audience and critic), and IMDb rating.

Figure 1: A visualization of the change in audience and critic ratings of Hollywood movies from 2007-2022.

Figure 1 visualizes how the average of Rotten Tomatoes and Metacritic scores has changed over the years, separated by audience and critic rating groups. Overall, it appears that ratings as a whole have increased slightly since 2007. Many factors could have led to this gradual increase: perhaps the quality of movies has improved over time, or reviewers have become more lenient in their scores. Additionally, the audience rating group seems to consistently give higher ratings than the critic rating group, possibly suggesting that audiences are less harsh when reviewing films. This analysis of ratings over the years helps us understand how much the behaviors of the audience and critic rating groups differ, and visualizes the overall pattern of ratings over the years.
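The yearly averages behind a plot like Figure 1 could be computed roughly as follows. This is a sketch on invented numbers, not the code used for the figure: the column names are hypothetical, and it assumes all four scores have already been put on the same 0-100 scale.

```r
# Made-up scores; column names are hypothetical and all on a 0-100 scale.
ratings <- data.frame(
  year        = c(2007, 2007, 2008, 2008),
  rt_critic   = c(60, 80, 70, 90),
  mc_critic   = c(50, 70, 60, 80),
  rt_audience = c(70, 90, 80, 100),
  mc_audience = c(60, 80, 70, 90)
)

# Per-movie average of the two sites, separately for each rating group.
ratings$critic_avg   <- rowMeans(ratings[, c("rt_critic", "mc_critic")], na.rm = TRUE)
ratings$audience_avg <- rowMeans(ratings[, c("rt_audience", "mc_audience")], na.rm = TRUE)

# Mean of those per-movie averages within each release year.
yearly <- aggregate(cbind(critic_avg, audience_avg) ~ year, data = ratings, FUN = mean)
```

The resulting `yearly` data frame has one row per year and one column per rating group, which is the shape a two-line time series plot needs.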

Figure 2: The distribution of the average of Rotten Tomatoes critic ratings and Metacritic critic ratings for movies that have won at least one Oscar and movies that have not won any Oscars.

Figure 2 makes use of two measures of movie success that are determined solely by movie critics: Oscar wins and average critic movie ratings. As can be seen in the box plot, Hollywood movies that have won at least one Oscar award have a higher average of Rotten Tomatoes and Metacritic critic ratings than those that have not won any Oscars. This correlation suggests a similar pattern between critics’ movie assessment and award recognition, in that movies that are praised enough by critics to win an Oscar are also highly favored among Rotten Tomatoes and Metacritic critics.

Figure 3: There is a slight positive association between Rotten Tomatoes critic rating and opening weekend revenue, but it is not very strong.

A movie’s Rotten Tomatoes critic rating is typically released before the movie hits theaters. Thus, I was interested in exploring the extent to which this rating influences the movie’s opening weekend revenue. Figure 3 shows, however, that the correlation between these two variables is not very strong. There is a very slight positive association, suggesting that to some extent, as a movie’s Rotten Tomatoes critic rating increases, so do its opening weekend earnings. But because this association is very weak, the Rotten Tomatoes critic score does not appear to have a drastic or direct impact on opening weekend revenue.
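The strength of an association like this one can be summarized with a Pearson correlation coefficient. The sketch below uses invented values and hypothetical column names; it illustrates the calculation only and does not reproduce the value behind Figure 3.

```r
# Invented values; column names are hypothetical.
toy <- data.frame(
  rt_critic       = c(20, 40, 60, 80, NA),
  opening_weekend = c(5, 30, 10, 45, 12)  # millions of dollars
)

# Pearson correlation, dropping rows where either value is missing.
r <- cor(toy$rt_critic, toy$opening_weekend, use = "complete.obs")
```

A coefficient near 0 indicates a weak linear association and a coefficient near 1 or -1 a strong one. Opening weekend revenue is usually heavily right-skewed, so a rank-based alternative (`method = "spearman"`) may be worth comparing.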

Figure 4: The average IMDb, Metacritic, and Rotten Tomatoes critic ratings for each unique script type combination of Hollywood movies.

Figure 4 visualizes the average IMDb, Metacritic, and Rotten Tomatoes critic ratings for each of the unique script type combinations of Hollywood movies from 2007-2022. One key point to note is that the average IMDb rating is only available for 5 out of the 16 script types, revealing a great amount of missingness within this variable and making it difficult to reach a conclusion about the relationship between script type and average IMDb rating. For the other two rating variables, the script type with both the highest Metacritic and Rotten Tomatoes critic ratings is “documentary”, suggesting that this script type is favored more by critics than other script types.

Figure 5: The distribution of the deviance of audience vs. critic movie ratings, by genre.

In this data set, “audience vs. critics deviance” refers to the difference between a movie’s average of Rotten Tomatoes and Metacritic critic ratings and its average of Rotten Tomatoes and Metacritic audience ratings. A negative deviance means there was a higher audience rating than critic rating, and vice-versa. With a majority of the bars on the negative side of the axis, this means that for a majority of genres, audience rating groups gave higher ratings than critic rating groups. Faceting by genre, Figure 5 shows that the genre with the lowest absolute deviance is the “biography” category. This suggests that audience and critic rating groups rated movies of this genre most similarly. On the other hand, the genre with the highest disparity in ratings is the “sci-fi” category, suggesting that audiences and critics differ in rating behavior the most for this genre.
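The sign convention described above (negative when the audience rating is higher) corresponds to subtracting the audience average from the critic average. A small sketch on invented numbers, with hypothetical column names, shows the calculation and a per-genre summary:

```r
# Invented averages; column names are hypothetical.
scores <- data.frame(
  genre        = c("biography", "biography", "sci-fi", "sci-fi"),
  critic_avg   = c(70, 80, 50, 60),
  audience_avg = c(72, 80, 70, 80)
)

# Critic minus audience: negative means the audience rated the movie higher.
scores$deviance <- scores$critic_avg - scores$audience_avg

# Mean deviance per genre, as a bar chart by genre would display.
by_genre <- aggregate(deviance ~ genre, data = scores, FUN = mean)
```

In this toy example the "biography" rows average close to zero while the "sci-fi" rows are strongly negative, which mirrors the kind of contrast the figure describes without reproducing its values.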

Figure 6: The distribution of the deviance of Rotten Tomatoes vs. Metacritic audience group movie ratings, by genre.

In Figure 6, the deviances between Rotten Tomatoes audience ratings and Metacritic audience ratings are explored by genre. A positive deviance means there was a higher average Rotten Tomatoes audience score than Metacritic audience score, and vice-versa. With every bar in this graph on the positive side of the axis except for the fantasy genre, this suggests that Rotten Tomatoes audience users gave higher ratings than the Metacritic audience rating group in all genres except fantasy. Interestingly, Figure 6 displays the exact opposite of the lowest and highest deviances found in Figure 5. This time, the genre with the lowest absolute deviance is the “sci-fi” category, suggesting that Rotten Tomatoes and Metacritic audiences rated movies in this category most similarly, and the genre in which the two audiences differed the most is the “biography” category. Because the “sci-fi” and “biography” categories sit at the extremes in both Figure 5 and Figure 6, this could suggest that these are the two genres on which opinion varies the most.

Figure 7: A visualization of the relationship between audience and critic ratings and domestic gross for each movie genre category.

Figure 7 seeks to explore the relationship between both rating groups’ ratings and domestic gross, as well as how the genre variable plays into this relationship. Overall, there seems to be a positive correlation between a movie’s rating and its domestic gross earnings. That is, as the rating of a movie increases, its domestic gross revenue also increases. However, this relationship seems to be steeper for audience ratings than for critic ratings, suggesting that as audience ratings increase, domestic gross earnings increase at a higher rate than they do with critic ratings. When looking at this relationship through the lens of the different genres, the “action” genre has the steepest slope in both plots, and it is again steeper for audience ratings. This suggests that for action movies specifically, an increase in rating is associated with a larger increase in domestic gross revenue than for other genres, and that this increase is larger for audience ratings than for critic ratings.

Figure 8: A visualization of the movie distributors with the ten highest average critic ratings and ten highest average audience ratings.

Figure 8 displays the movie distributors with the highest average Rotten Tomatoes and Metacritic ratings, for both audience and critic rating groups. The movie distributing company with the highest average critic rating is “A24”, and the highest average audience rating was received by “Atlas Distribution Company”. This could suggest that movies released by A24 were favored the most among critics, and movies released by Atlas Distribution Company were favored the most among audiences.
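A ranking like the one in Figure 8 comes down to averaging a rating within each distributor and keeping the top rows. The sketch below uses placeholder distributor names and invented ratings, with hypothetical column names, and keeps the top two rather than the top ten to stay short.

```r
# Placeholder distributors and invented ratings; column names are hypothetical.
films <- data.frame(
  distributor = c("D1", "D1", "D2", "D3", "D3"),
  critic_avg  = c(90, 80, 60, 70, 75)
)

# Mean critic rating per distributor.
means <- aggregate(critic_avg ~ distributor, data = films, FUN = mean)

# Sort from highest to lowest mean and keep the top two.
top <- head(means[order(-means$critic_avg), ], 2)
```

One caveat when reading such rankings: a distributor with only one or two films can reach the top on a single well-rated release, so checking the number of films per distributor alongside the mean may be worthwhile.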

Figure 9: A comparison of the relationships between critic and audience ratings and the percent of a movie’s revenue that is earned from audiences abroad.

Figure 9 explores the potential relationship between the audience and critic ratings and the percent of a movie’s gross that is earned abroad. We see virtually no correlation for critic ratings and only the slightest positive correlation for audience ratings. This could suggest that critic ratings have practically no link with the share of a movie’s gross earned abroad, while audience ratings may have a slight connection to it, though not to a meaningful degree.
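The percent of gross earned abroad can be derived from the two gross columns. The sketch below assumes that worldwide gross equals domestic plus foreign gross, which may not match how the data set defines the variable; the numbers and column names are invented.

```r
# Invented gross figures in millions; column names are hypothetical.
gross <- data.frame(
  domestic = c(100, 50, 0),
  foreign  = c(100, 150, 0)
)

# Assumption: worldwide gross is the sum of domestic and foreign gross.
worldwide <- gross$domestic + gross$foreign

# Share earned abroad, left missing when there is no recorded gross at all.
gross$pct_abroad <- ifelse(worldwide > 0, 100 * gross$foreign / worldwide, NA_real_)
```

Guarding against a zero denominator avoids `NaN` values that would otherwise be silently dropped or mis-plotted.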

Variable 2: Opening Weekend Revenue

A movie’s opening weekend revenue refers to the total box office earnings that the film earned during its first weekend of release in theaters.

Figure 10: A visualization of the change in yearly average opening weekend revenue for Hollywood movies from 2007-2022.

Figure 10 visualizes the change in the mean opening weekend earnings (in millions) for Hollywood movies from 2007-2022. As can be seen in the graph, there are two distinct low points corresponding to the years 2008 and 2020, and these drops can be explained by the economic state of the country during those years. In 2008, the country experienced the Great Recession, which greatly impacted the film industry. This economic crisis led to a dramatic decline in consumer spending and movie production, possibly leading to the drop in mean opening weekend earnings that we see in the graph for this year. In 2020, we see a significantly more drastic drop in mean opening weekend revenue, as the COVID-19 pandemic led to nationwide shutdowns of and capacity limits on movie theaters. With these conditions, there was a dramatic decline in movie theater ticket sales and thus a dramatic drop in the mean opening weekend revenue of movies released during the pandemic, as shown in the graph. These findings are something to keep in mind throughout this variable analysis, as opening weekend revenue is highly impacted by economic crises such as the 2008 Great Recession and the 2020 COVID-19 pandemic.

Figure 11: The relationship between a movie’s earnings during the first weekend of its release and its overall budget recovery earnings, compared across genres and script types.

Figure 11 explores the relationship between a movie’s opening weekend revenue and how much it earns to recover its production cost (budget recovery), categorized by script type and genre. Overall, there is a clear, strong, positive correlation between opening weekend revenue and budget recovery, suggesting that as the amount of money a movie earns during the first weekend of its release in theaters increases, the amount of money it earns to recover its budget also increases. However, this correlation varies between genres and script types. In examining genre, the ‘adventure’ category has the highest correlation. This may suggest that out of all movie genres, adventure movies earn back more of their budgets as their opening weekend earnings increase. In examining script type, the ‘remake’ script type has the highest correlation, also suggesting that remakes earn back more of their budgets as their opening weekend earnings increase.

Figure 12: The genres and script types of the top 5 movies that earned the most revenue during their opening weekends.

Figure 12 shows that the genre combination with the greatest average opening weekend revenue is sci-fi & fantasy, and the script type combination with the greatest average opening weekend revenue is sequel & adaptation. This suggests that movies categorized as a sci-fi and fantasy hybrid earned more during their first weekend of release than other genre combinations, and that sequel adaptations did the same among script type combinations.

Figure 13: The distribution of opening weekend revenue for movies that have won at least one Oscar award and movies that have won 0 Oscars.

Figure 13 shows that Hollywood movies that have won at least one Oscar award have an average opening weekend revenue that is actually lower than that of movies that have not won any Oscars. This could suggest that a high opening weekend revenue is not associated with winning an Oscar. In other words, having a high opening weekend revenue may not increase a movie’s chance of winning an Oscar.

Figure 14: The relationships between a Hollywood movie’s earnings during the first weekend of its release and its domestic and foreign gross.

Figure 14 displays very strong, positive correlations between opening weekend revenue and both domestic gross and foreign gross. This suggests that a Hollywood movie’s performance during its opening weekend has a direct positive association with its overall domestic and foreign grosses. That is, as opening weekend earnings increase, so do domestic and foreign gross. Additionally, the relationship between opening weekend revenue and domestic gross seems to be slightly steeper than the relationship between opening weekend revenue and foreign gross, suggesting that opening weekend performance is slightly more closely tied to domestic gross performance than to foreign gross performance.

Figure 15: The top 10 most successful movie distribution companies in terms of average opening weekend revenue success.

In examining the top distribution companies based on opening weekend performance, Figure 15 displays that the “Walt Disney Studios” movie distribution company has earned a mean opening weekend revenue that is significantly greater than other movie distributors. This suggests that movies that are released by this company have earned revenue during their opening weekends of being in theaters at a rate significantly greater than movies released by other companies.

Variable 3: Domestic, Foreign, & Worldwide Gross

Figure 16: A visualization of the change in yearly domestic gross for Hollywood movies from 2007-2022.

Figure 16 visualizes the change in the yearly average domestic gross (in millions) for Hollywood movies from 2007-2022. Just as in Figure 10, there are significant drops for the years 2008 and 2020, also due to the economy of the country during those years. With the 2008 Great Recession, declines in consumer spending due to the economic downturn directly impacted the total box office revenue of movies. With the 2020 COVID-19 pandemic, quarantining and the closing of movie theaters also led to declines in consumer spending and a direct decline in gross domestic revenue for movies. Like the opening weekend revenue variable, the domestic gross variable is heavily impacted by economic crises such as the 2008 Great Recession and the 2020 COVID-19 pandemic.

Figure 17: A visualization of the relationship between a Hollywood movie’s domestic and foreign gross.

Figure 17 displays a direct and strong positive correlation between the domestic gross earnings and foreign gross earnings of Hollywood movies. In other words, as the domestic gross earnings of a movie increase, its foreign gross earnings also increase. This suggests that US and foreign audiences have similar preferences in movie popularity.

Figure 18: The distributions of domestic and foreign gross for each genre of Hollywood movies.

Figure 18 seeks to explore another comparison of movie preference behavior between domestic and foreign audiences, this time by comparing gross performance among movie genres. In determining the most popular genres by highest average gross revenue between the two audiences, the “sci-fi” category has the best domestic performance, while the “action” and “adventure” categories are tied for the best foreign performance. This suggests that there is a difference in movie genre popularity between the two audiences, in that US movie audiences have a high preference for sci-fi movies, while foreign movie audiences have a high preference for action and adventure movies. A sci-fi movie may perform better in US movie theaters than in foreign ones, and action and adventure movies may perform better in foreign movie theaters.

Figure 19: A comparison of the top 5 movie distributors with the highest gross earnings between domestic and foreign gross.

As a final comparison of movie preference behavior between domestic and foreign audiences, Figure 19 explores the movie distributors with the top 5 highest average domestic and foreign gross revenues. For both US and foreign audiences, the movie distributor with the most successful gross performance is Walt Disney Studios. This reveals a similarity between domestic and foreign audiences in that movies distributed by Walt Disney Studios are more popular (generate more gross revenue) than movies released by other distributors.

Figure 20: A visualization of the relationship between a Hollywood movie’s worldwide gross revenue and the percent of its production budget that was recovered.

In Figure 20, there is a clear positive relationship between a movie’s worldwide gross earnings and the percent of its budget that is recovered. This suggests that the greater the box office revenue a movie earns, the more of its budget it is able to earn back following its production and release into theaters.
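For reference, one common way to express budget recovery is worldwide gross as a percent of the production budget. Whether the data set defines its budget recovery variable this way is an assumption here, and the rows and column names below are invented.

```r
# Invented figures in millions; column names are hypothetical.
budgets <- data.frame(
  title           = c("X", "Y"),
  budget          = c(100, 200),
  worldwide_gross = c(250, 100)
)

# Assumed definition: worldwide gross as a percent of the production budget.
budgets$pct_recovered <- 100 * budgets$worldwide_gross / budgets$budget

# TRUE when the movie earned back at least its full budget.
budgets$recovered_budget <- budgets$pct_recovered >= 100
```

Under this definition the positive relationship in the figure is partly built in, since worldwide gross is the numerator of the recovery percentage.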

Variable 4: Budget & Budget Recovery

Figure 21: A visualization of the change in average production budgets for Hollywood movies from 2007-2022.

Figure 21 follows the same patterns as Figure 10 and Figure 16, showing that the variable of movie budget is also highly impacted by economic crises. In this graph, there are also two distinct low points corresponding to the years 2008 and 2020. With the 2008 Great Recession, financial challenges could have resulted in cost-cutting measures and a more stringent approach to budgeting for movie distributors, leading to a lower average movie budget for that year. With the 2020 COVID-19 pandemic and quarantine, film studios may have altered their production strategies of their movies by delaying the start of filmmaking, leading to an overall decline in film production and thus a decline in mean budgets for that year. From these three similar variable findings, there seems to be a common trend that a movie’s success is greatly impacted by the economy.

Figure 22: The correlation between a Hollywood movie’s production budget and the variables of opening weekend revenue and worldwide gross revenue.

In Figure 22, there is a clear positive association between a Hollywood movie’s budget and its earnings both during its opening weekend of release and overall worldwide. This suggests that, on average, movies with higher production budgets tend to achieve greater box office revenue. It can be concluded that movie budget is closely related to opening weekend revenue and worldwide gross, in that as a movie’s budget increases, its opening weekend revenue and worldwide gross revenue also increase.

Figure 23: The distribution of movie budgets for each genre category.

Figure 23 visualizes the distribution of movie production budgets for each of the genre categories, with the fantasy genre having the highest average budget. This could be due to the fact that the production of fantasy movies usually involves elaborate visual effects, intricate makeup/costumes, computer-generated imagery (CGI), and other advanced technologies to create mythical worlds and landscapes, thus requiring substantial financial investment in technology, skilled artists, and post-production processes that contribute to an overall high average budget.

Figure 24: The distribution of budget for movies that have won at least one Oscar award and movies that have won 0 Oscars.

Similar to Figure 13, Figure 24 shows that Hollywood movies that have won at least one Oscar award have an average production budget that is actually lower than that of movies with no Oscar wins. This could suggest that having a high production budget does not relate to or increase the chances of a movie winning an Oscar, and that a high production budget may not be a factor taken into account when voting for Oscars.

Figure 25: The associations of movie budget with 3 movie rating measures of the average of Rotten Tomatoes and Metacritic critic ratings, the average of Rotten Tomatoes and Metacritic audience ratings, and IMDb ratings.

Figure 25 seeks to explore how a movie’s production budget is correlated with three rating measures: the average of Rotten Tomatoes and Metacritic critic scores, the average of Rotten Tomatoes and Metacritic audience scores, and IMDb ratings. In all three graphs, there seem to be very weak positive correlations, as the data points are widely scattered. This could suggest that there is a slight tendency for movies with higher budgets to receive slightly higher ratings, but the relationship is not very strong, and movie budget is not a direct determinant of rating success.

Variable 5: Oscar Wins

Figure 26: The top 5 movies with the most Oscar wins, by genre and script type.

Figure 26 displays that the genre combination with the most Oscar wins is “biography, history”, and the script type with the most Oscar wins is “original screenplay”. This suggests that the movies categorized as “biography, history” or “original screenplay” are more successful among Oscar voters.

Figure 27: The distribution of worldwide gross revenue for movies that have won at least one Oscar award and movies that have won 0 Oscars.

Figure 27 shows that Hollywood movies that have won at least one Oscar award have an average of worldwide gross earnings that is greater than movies that have not won any Oscars. This could suggest a link between these two variables in that movies that have won an Oscar also have a better worldwide box office revenue performance than movies that have not won any Oscars.

Variable 6: Seasonal Release Date

These analyses seek to explore how the five main variables above vary with the season a movie is released in, and what seasonal release date trends may exist in influencing a movie’s success rate.
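A season variable can be derived from a movie’s release date. The sketch below assumes meteorological seasons (Winter as December through February, Spring as March through May, and so on); the report does not state which month boundaries were used, so this mapping is an assumption, and the dates are invented.

```r
# Invented release dates.
release <- as.Date(c("2019-01-15", "2019-04-26", "2019-07-19",
                     "2019-10-04", "2019-12-20"))

# Month number (1-12) extracted from each date.
month_num <- as.integer(format(release, "%m"))

# Assumed mapping from month number to meteorological season.
season_lookup <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
                   "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")

# Index by month; a missing date yields a missing season.
season <- factor(season_lookup[month_num],
                 levels = c("Winter", "Spring", "Summer", "Fall"))
```

Storing the result as a factor with explicit levels keeps the seasons in calendar order in tables and plots instead of alphabetical order.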

Figure 28: A comparison of the mean ratings of movies based on the season they were released in, between the critic and audience rating groups.

Figure 28 shows a comparison of the average ratings for each season between the critic and audience rating groups. There appears to be a similar pattern in both rating groups’ seasonal average ratings, with the highest ratings given to movies released in the Fall and the lowest ratings given to movies released in the Winter. This reveals a similarity in the seasonal patterns of movie ratings for the two rating groups. However, the taller bars in the plot on the right depict a disparity between the two groups’ rating patterns, in that the audience rating group gives higher ratings than the critic rating group, as revealed in Figure 1. Figure 28 thus visualizes a way in which the rating patterns for these two groups are similar, and confirms a previous finding of a way in which their patterns differ. An overall conclusion can be made that movies released in the Fall have the highest ratings, while movies released in the Winter have the lowest ratings.

Figure 29: A comparison of the average opening weekend revenues in millions of dollars for movies based on what season they were released in.

In Figure 29, it is clear that movies with the highest average opening weekend revenue were released in the Spring. This could suggest that movies that are released in the Spring are more successful in terms of generating more earnings during their first weekend in theaters than movies released in other seasons.

Figure 30: The average total revenue generated by films from all sources globally by the season the film was released in.

Figure 30 shows that movies released during the Summer months have the highest average worldwide gross. This could be due to the fact that in many countries around the world, kids are on summer vacation during these months, and thus families are more likely to go to the movies and contribute to increased ticket sales.

Figure 31: A comparison of the average production budget of movies based on what season they were released in.

Figure 31 shows that movies released in the Spring have the highest average movie budgets. This directly aligns with previous findings in the EDA. In Figure 22, it was concluded that there exists a positive association between a Hollywood movie’s budget and its opening weekend earnings. Therefore, since Figure 29 revealed that Spring is the season with the highest average opening weekend revenue, we would expect Spring to also be the season with the highest average movie budgets, and that is what we see in this plot. This supports our finding of the positive correlation between a movie’s budget and opening weekend revenue.

Figure 32: The distribution of Oscar wins based on what season the movie was released in.

In Figure 32, movies that were released in the Fall season won significantly more Oscars than movies released in other seasons. This may be because the Fall season is close to when Oscar voting starts, so these films are more salient and relevant among the voters, while still leaving enough time before voting for the films to gain popularity and traction. From this, we can conclude that when a film’s success is defined solely by its number of Oscar wins, releasing the film during the Fall season is associated with a greater chance of success.

Conclusion

text

References

text

Appendix: technical info

text

Appendix: extra explorations

Figure 33: A visualization of the genres present in each category of movie script type.

Movie with the highest: average critic rating, average audience rating, opening weekend revenue, domestic gross, foreign gross, worldwide gross, budget, budget recovered, Oscar wins, IMDb rating.